28 research outputs found

    Detection of algorithmically generated malicious domain names using masked N-grams

    Malware detection is a challenge that has increased in complexity in the last few years. A widely adopted strategy is to detect malware by analyzing network traffic and capturing its communications with command and control (C&C) servers. However, some malware families have shifted to a stealthier communication strategy, since anti-malware companies maintain blacklists of known malicious locations. Instead of using static IP addresses or domain names, they algorithmically generate domain names that may host their C&C servers. Hence, blacklist approaches become ineffective, since the number of domain names to block is large and varies over time. In this paper, we introduce a machine learning approach using Random Forest that relies on purely lexical features of the domain names to detect algorithmically generated domains. In particular, we propose using masked N-grams, together with other statistics obtained from the domain name. Furthermore, we provide a dataset built for experimentation that contains regular and algorithmically generated domain names coming from different malware families. We also classify these families according to their type of domain generation algorithm. Our findings show that masked N-grams provide detection accuracy that is comparable to that of other existing techniques, but with much better performance.
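The abstract above does not define how the N-grams are masked. As a rough illustration only, the sketch below assumes that masking means replacing each character of a domain name with a symbol for its class (vowel, consonant, digit, other) before counting N-grams. The function names `mask_domain` and `masked_ngrams` and the class symbols `v`, `c`, `n`, `s` are hypothetical and are not taken from the paper.

```python
from collections import Counter

VOWELS = set("aeiou")

def mask_domain(name: str) -> str:
    """Replace each character with a class symbol (assumed scheme):
    v = vowel, c = other letter, n = digit, s = anything else."""
    out = []
    for ch in name.lower():
        if ch in VOWELS:
            out.append("v")
        elif ch.isalpha():
            out.append("c")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append("s")
    return "".join(out)

def masked_ngrams(name: str, n: int = 2) -> Counter:
    """Count the N-grams of the masked form of a domain label."""
    masked = mask_domain(name)
    return Counter(masked[i:i + n] for i in range(len(masked) - n + 1))

if __name__ == "__main__":
    print(mask_domain("google"))       # masked form of a regular-looking label
    print(masked_ngrams("x7k2qz9", 2)) # bigram counts for a random-looking label
```

Under this assumed scheme the masked alphabet has only four symbols, so the number of possible N-grams stays small. Counts like these could then be passed, together with other lexical statistics, to a classifier such as the Random Forest the abstract mentions.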

    Support vector machines framework for linear signal processing

    This paper presents a support vector machines (SVM) framework to deal with linear signal processing (LSP) problems. The approach relies on three basic steps for model building: (1) identifying the suitable base of the Hilbert signal space in the model, (2) using a robust cost function, and (3) minimizing a constrained, regularized functional by means of the method of Lagrange multipliers. Recently, autoregressive moving average (ARMA) system identification and non-parametric spectral analysis have been formulated under this framework. The generalized, yet simple, formulation of SVM LSP problems is particularized here for three different issues: parametric spectral estimation, stability of Infinite Impulse Response filters using the gamma structure, and complex ARMA models for communication applications. The good performance shown on these different domains suggests that other signal processing problems can be stated from this SVM framework.

    Randomized Machine Learning Approaches: Recent Developments and Challenges

    Randomness has always been present in one form or another in Machine Learning (ML) models. The last few years have seen a change of role in the use of randomness, which is no longer a specific and accessory improvement in very particular aspects of a model, but the main theoretical basis that supports some ML methods, e.g., the well-known random forests. In the Neural Network (NN) area, randomness has given rise since its origins to a rich set of models, which have recently been exploited especially for efficiency aims. However, the bias induced by the use of NNs with random weights deserves further analysis, especially in light of the novel advances in the fields of deep NNs, dynamical systems (Recurrent NNs), and NNs for learning in structured domains.

    Integration of Behavioral Economic Models to Optimize ML performance and interpretability: a sandbox example

    This paper presents a sandbox example of how the integration of models borrowed from Behavioral Economics (specifically Protection-Motivation Theory) into ML algorithms (specifically Bayesian Networks) can improve the performance and interpretability of ML algorithms when applied to behavioral data. Using Behavioral Economics knowledge to define the architecture of the Bayesian Network increases the accuracy of the predictions by 11 percentage points. Moreover, it simplifies the training process, removing the computational effort otherwise needed to identify the optimal structure of the Bayesian Network. Finally, it improves the explicability of the algorithm, avoiding illogical relations among variables that are not supported by previous behavioral cybersecurity literature. Although preliminary and limited to one simple model trained with a small dataset, our results suggest that the integration of behavioral economics and complex ML models may open a promising strategy to improve the predictive power, training costs and explicability of complex ML models. This integration will contribute to solving the scientific issue of the ML exhaustion problem and to creating a new ML technology with relevant scientific, technological and market implications.

    Robust γ-filter using support vector method

    This Letter presents a new approach to time series modelling using support vector machines (SVM). Although the γ filter can provide stability in several time series models, the SVM is proposed here to provide robustness in the estimation of the γ filter coefficients. Examples in chaotic time series prediction and channel equalization show the advantages of the joint SVM γ filter.
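For readers unfamiliar with the g (gamma) filter named above, the following is a minimal sketch of the commonly cited gamma memory recursion, x_0(n) = u(n) and x_k(n) = (1 - mu) x_k(n-1) + mu x_{k-1}(n-1), with output y(n) = sum_k w_k x_k(n). Whether the Letter uses exactly this form is an assumption, and the sketch shows only the filter structure, not the SVM-based estimation of the coefficients. The names `gamma_filter`, `weights`, and `mu` are illustrative.

```python
def gamma_filter(u, weights, mu):
    """Run a gamma filter over the input sequence u.

    weights: output coefficients w_0..w_K (assumed already estimated).
    mu: memory parameter; mu = 1 reduces the structure to a plain
        tapped delay line (FIR filter).
    """
    K = len(weights) - 1
    taps = [0.0] * (K + 1)  # x_0(n)..x_K(n), zero initial state
    y = []
    for sample in u:
        prev = taps[:]       # tap values at time n-1
        taps[0] = sample     # x_0(n) = u(n)
        for k in range(1, K + 1):
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        y.append(sum(w * x for w, x in zip(weights, taps)))
    return y

if __name__ == "__main__":
    # Impulse response seen through the first gamma tap only.
    print(gamma_filter([1.0, 0.0, 0.0, 0.0], weights=[0.0, 1.0], mu=0.5))
```

In this form a single parameter `mu` controls the memory depth of all taps, and the output stays linear in `weights`, which is the part a robust estimator would fit.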

    Kernel methods for HyMap imagery knowledge discovery. G. Camps-Valls, L. Gómez-Chova, J. Calpe-Maravilla

    In this paper, we propose a kernel-based approach for hyperspectral knowledge discovery, which is defined as a process that involves three steps: pre-processing, modeling and analysis of the classifier. First, we select the most representative bands by analyzing the surrogate and main splits of a Classification And Regression Trees (CART) approach. This yields three datasets with different reduced input dimensionality (6, 3 and 2 bands, respectively) along with the original one (128 bands). Second, we develop several crop cover classifiers for each of them. We use Support Vector Machines (SVM) and analyze their performance in terms of efficiency and robustness, as compared to multilayer perceptrons (MLP) and radial basis function (RBF) neural networks. Suitability to real-time working conditions, whenever a preprocessing stage is not possible, is evaluated by considering models with and without the CART-based feature selection stage. Finally, we analyze the distribution of the support vectors in the input space and through Principal Component Analysis (PCA) in order to gain knowledge about the problem. Several conclusions are drawn: (1) SVM yield better outcomes than neural networks; (2) training neural models is unfeasible when working with high dimensional spaces; (3) SVM perform similarly in the four classification scenarios, which indicates that noisy bands are successfully detected; and (4) relevant bands for the classification are identified.

    Support vector machines for crop classification using hyperspectral data

    In this communication, we propose the use of Support Vector Machines (SVM) for crop classification using hyperspectral images. SVM are benchmarked against well-known neural networks such as multilayer perceptrons (MLP), Radial Basis Functions (RBF) and Co-Active Neural Fuzzy Inference Systems (CANFIS). Models are analyzed in terms of efficiency and robustness, which is tested according to their suitability to real-time working conditions whenever a preprocessing stage is not possible. This can be simulated by considering models with and without a preprocessing stage. Four scenarios (128, 6, 3 and 2 bands) are thus evaluated. Several conclusions are drawn: (1) SVM yield better outcomes than neural networks; (2) training neural models is unfeasible when working with high dimensional input spaces; and (3) SVM perform similarly in the four classification scenarios, which indicates that noisy bands are successfully detected.